Abstract

Texture analysis is performed on multibeam sonar
imagery.  A  set  of  fourteen  texture  features  is  com¬ 
puted  using  co-occurrence  matrices  to  form  the  fea¬ 
ture  space.  The  dimensionality  of  the  feature  space 
is  reduced  by  extracting  the  principal  components 
from  the  original  feature  space.  Classification  of 
the image is performed on the principal components
using the K-Means algorithm. Results indicate
that  seafloor  bottom  types  can  be  characterized  by 
analyzing  the  texture  of  the  bathymetric  sonar  im¬ 
ages. 

1  Introduction 

Multibeam echo sounders have recently been used
to map seafloor with high resolution. However,
bathymetry does not yield other seafloor characteristics
such as bottom type and seafloor roughness.
These characteristics can be inferred from the fluctuations
in the backscattered acoustic signal [3]. Texture
is a property that provides information about the
roughness of an object. In this paper, we attempt to
use texture analysis to extract information about the
roughness of the seafloor, and classify areas with similar
features together.

A large swath of seafloor can be mapped by analyzing
the backscattered data from each ping, i.e.,
transmission cycle. The data from each ping are divided
into 256 parts corresponding to 256 directions.
The data in each of the directions form a bin. A
powerful tool to extract texture information, the
co-occurrence matrix, is employed on the sonar image. A
co-occurrence matrix is formed for the data in each
of the bins in a ping. A set of 14 texture features is
then computed from the co-occurrence matrix. The
results of the texture feature extraction are combined
to form a single 14-dimensional texture feature image
data set. A principal components transform is applied
to this 14-dimensional space and the most significant
components are used to form the feature space for the
classifier. A simple clustering algorithm is used to classify
the seafloor area surveyed into “similar” regions.


2  Image  Construction  From  Sonar

The requisite data for texture analysis are provided
by a multibeam sonar system. Sonar energy
from the system projector array located on the ship’s
hull impinges on the bottom of the ocean as a narrow
beam. The echo is received by an array of hydrophones
mounted athwartships (perpendicular to the projector).
Beamforming is performed by computing a 256
point Fast Fourier Transform (FFT) on the raw data
collected. This yields an array of return intensities: an
intensity for each beamformer bin and sample time.
The sampling rate is such that approximately 1000
samples are obtained for each bin. This is the data on
which texture analysis is performed.


3  Texture  Analysis 

Even  though  a  precise  definition  of  texture  does 
not  exist,  image  texture  can  be  qualitatively  described 
as  having  one  or  more  properties  of  fineness,  coarse¬ 
ness,  smoothness,  granulation,  randomness,  lineation, 
or  being  mottled,  irregular,  or  hummocky  [13].  Basi¬ 
cally,  texture  refers  to  repetition  of  basic  texture  el¬ 
ements  called  texels.  A  texel  contains  several  pixels 
whose placement could be periodic, quasi-periodic or
random.  The  two  dimensions  of  texture  are  the  de¬ 
scription  of  these  texels,  and  the  spatial  distribution 
of  these  primitives.  A  texel  is  a  set  of  pixels  with  some 
common  tonal  feature  or  local  properties. 

There  is  a  close  relationship  between  tone  and  tex¬ 
ture.  Consider  a  small  area  of  an  image.  As  the  num¬ 
ber  of  distinguishable  tonal  properties  decreases,  the 


tonal  properties  will  predominate.  When  the  small- 
area  patch  is  the  size  of  one  pixel  so  that  there  is 
only  one  discrete  feature,  the  only  property  present  is 
simple  gray  tone.  As  the  number  of  distinguishable 
tonal  properties  increases,  the  texture  property  will 
predominate.  When  the  spatial  pattern  of  the  tonal 
primitives  is  random  and  the  gray  tone  varies  widely 
between primitives, a fine texture results. As the spatial
pattern becomes more definite and the tonal
regions  increase  in  size,  a  coarser  texture  results.  Thus 
we  see  that,  to  characterize  texture,  equal  considera¬ 
tion  must  be  given  to  both  the  tonal  primitives  and 
the  spatial  dependence  between  the  primitives. 

A  number  of  approaches  to  analyzing  texture  have 
been  presented  in  the  literature.  A  comprehensive 
survey  of  the  basic  approaches  is  presented  by  Haral- 
ick  [13].  Some  of  the  methods  of  texture  analysis  are 
co-occurrence  matrices  [14],  gray  level  run  lengths  [9], 
Markov  models  [5],  [11],  structural  analysis  [13],  and 
fractal  analysis  [1]. 

Conners  and  Harlow  [17]  present  a  theoretical  com¬ 
parison  of  the  Co-occurrence  Method  (their  term  is 
Spatial  Gray  Level  Dependence  Method),  the  Gray 
Level  Run  Length  Method,  the  Gray  Level  Difference 
Method,  and  the  Power  Spectral  Method,  and  con¬ 
clude  that  the  Co-occurrence  Method  is  the  most  pow¬ 
erful  algorithm  for  texture  analysis.  Hence  our  choice 
of  co-occurrence  matrices  for  the  analysis  of  texture. 

Mastin  et  al.  used  co-occurrence  methods  on  SAR 
imagery  of  coastal  waters  for  obtaining  offshore  wind 
direction  and  for  the  estimation  of  aerodynamic  rough¬ 
ness  parameters  [4].  Aloimonos  addressed  the  problem 
of  determining  shape  from  texture  [7].  Haralick  et  al. 
tried  to  use  textural  features  of  photomicrographs  of 
sandstones  to  identify  the  type  of  rocks  and  applied 
textural  analysis  to  satellite  imagery  [14]. 

3.1  Co-occurrence  Matrices 

As  remarked  earlier,  knowledge  of  the  second  or¬ 
der  statistics  of  the  image  is  required  to  adequately 
describe texture. A histogram is an estimate of the
first  order  statistics  of  an  image  (or  of  a  region).  The 
normalized  histogram  is  computed  as 

$$p(i) = \frac{N(i)}{N}, \quad i = 0, 1, \ldots, 2^{b} - 1 \qquad (1)$$

where  N(i)  is  the  number  of  pixels  in  the  image  (re¬ 
gion)  with  intensity  value  i,  N  is  the  total  number  of 
pixels  in  the  image  (region),  and  b  is  the  number  of 
bits  per  pixel  in  the  image. 
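
As an illustration of Equation (1), the normalized histogram can be sketched in a few lines of Python with NumPy. The function name `normalized_histogram` and the small 2 bit/pixel test image are assumptions made for this example; they are not taken from the sonar data.

```python
import numpy as np

def normalized_histogram(image, bits):
    """Return p(i) = N(i) / N for i = 0 .. 2**bits - 1."""
    levels = 2 ** bits
    # N(i): number of pixels with intensity i.
    counts = np.bincount(np.asarray(image).ravel(), minlength=levels)
    # N: total number of pixels in the image (region).
    return counts / counts.sum()

# Hypothetical 2 bit/pixel image (intensities 0..3), for illustration only.
example_image = np.array([[0, 0, 1, 1],
                          [0, 0, 1, 1],
                          [0, 2, 2, 2],
                          [2, 2, 3, 3]])
p = normalized_histogram(example_image, bits=2)
```

The histogram ignores where the pixels are located, which is why it captures only first order statistics.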

The  analog  of  the  histogram  for  second  order  statis¬ 
tics  is  the  co-occurrence  matrix.  The  co-occurrence 


matrix  is  also  computed  in  a  “census”  fashion  by 
counting  pairs  of  occurrences  of  pixel  values  given  a 
certain  spatial  relationship  for  the  pair.  The  normal¬ 
ized  co-occurrence  matrix  elements  are  computed  as 

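$$P(i, j; d, \theta) = \frac{N(i, j)}{N}, \qquad \lVert x_1 - x_2 \rVert = D(d, \theta) \qquad (2)$$

for pairs of pixels at locations x_1 and x_2 having intensity
values i and j, respectively. Here N(i, j) is the number
of such pairs and N is the total number of pairs
counted. The distance measure
D(d, θ) states that the spatial relationship of the pair
of pixels is that they are located at a distance magnitude
d apart and at an angle θ (or θ + π) from each
other.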

A  complete  set  of  co-occurrence  matrices  would 
cover all values of d and θ over a meaningful range.
The values of θ would vary between 0 and π using
some  number  of  discrete  steps.  The  value  for  d  would 
range  from  1  up  to  some  distance  where  the  correlation 
between  pixels  is  still  significant. 

In  practice,  several  co-occurrence  matrices  are  com¬ 
puted  for  several  integral  values  of  d  and  for  four  values 
of θ: 0, π/4, π/2, and 3π/4. Figure 4 shows several
computed  co-occurrence  matrices  for  a  simple  example 
2  bit/pixel  image. 

One  of  the  disadvantages  of  the  co-occurrence  ma¬ 
trix  method  is  the  potentially  large  amount  of  data 
computed for different pairs of d and θ. Only four
of  the  many  possible  co-occurrence  matrices  are  com¬ 
puted  in  Figure  4.  However,  co-occurrence  statistics 
are  powerful  in  that  they  are  invariant  under  mono¬ 
tonic  intensity  transformations  [15]. 
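
The counting procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used for the sonar data: the function name `cooccurrence_matrix`, the restriction to the four usual angles, and the small 2 bit/pixel test image are all choices made for the example. Each pair is counted in both directions, so that θ and θ + π are treated alike and the matrix is symmetric.

```python
import numpy as np

def cooccurrence_matrix(image, d=1, theta_deg=0, levels=4):
    """Normalized, symmetric co-occurrence matrix for displacement (d, theta)."""
    # (row step, column step) for the four usual angles; rows increase downward.
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[theta_deg]
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r, c], image[r2, c2]
                counts[i, j] += 1  # pair seen at angle theta
                counts[j, i] += 1  # same pair seen at angle theta + pi
    # Normalize by the total number of pairs counted.
    return counts / counts.sum()

# Hypothetical 2 bit/pixel image (intensities 0..3), for illustration only.
example_image = np.array([[0, 0, 1, 1],
                          [0, 0, 1, 1],
                          [0, 2, 2, 2],
                          [2, 2, 3, 3]])
P = cooccurrence_matrix(example_image, d=1, theta_deg=0, levels=4)
```

For this image, 24 directed horizontal pairs are counted, so every entry of `P` is a multiple of 1/24.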

3.2  Texture  Analysis  of  Multibeam  Sonar 
Data 

A co-occurrence matrix with d = 1 and θ = 0 is
formed for the array of data in each bin. We keep θ
= 0 since the pings are not georeferenced and the
process of georeferencing will suppress many texture
attributes. In other words, it is assumed that the data
from  two  adjacent  pings  are  independent.  The  dis¬ 
tance  d  is  kept  small  because  we  expect  the  primitives 
to  be  relatively  small.  Haralick  et  al.  [14]  present  a  set 
of  14  texture  features  that  can  be  computed  from  co¬ 
occurrence  matrices.  The  meanings  of  some  of  the  fea¬ 
tures  are  also  presented.  Some  of  the  important  tex¬ 
ture  features  computed  are  angular  second  moment, 
contrast,  correlation,  inverse  difference  moment,  and 
entropy. 
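
A sketch of how the five features named above can be computed from a normalized co-occurrence matrix is given below. The formulas follow the commonly used definitions of these features; the function name `texture_features`, the use of natural logarithms for the entropy, and the example matrix are assumptions of this illustration, and the remaining nine features are not shown.

```python
import numpy as np

def texture_features(P):
    """Five co-occurrence features of a normalized matrix P (entries sum to 1)."""
    P = np.asarray(P, dtype=float)
    levels = P.shape[0]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    # Means and standard deviations of the row and column marginals.
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sigma_i = np.sqrt((((i - mu_i) ** 2) * P).sum())
    sigma_j = np.sqrt((((j - mu_j) ** 2) * P).sum())
    nonzero = P[P > 0]  # skip zero entries so that log() is defined
    return {
        "angular_second_moment": (P ** 2).sum(),
        "contrast": (((i - j) ** 2) * P).sum(),
        # Undefined (division by zero) when only one gray level is present.
        "correlation": ((i - mu_i) * (j - mu_j) * P).sum() / (sigma_i * sigma_j),
        "inverse_difference_moment": (P / (1.0 + (i - j) ** 2)).sum(),
        "entropy": -(nonzero * np.log(nonzero)).sum(),
    }

# Hypothetical normalized co-occurrence matrix for a 2 bit/pixel image.
P = np.array([[4, 2, 1, 0],
              [2, 4, 0, 0],
              [1, 0, 6, 1],
              [0, 0, 1, 2]]) / 24.0
features = texture_features(P)
```

A matrix concentrated on its diagonal gives low contrast and a high inverse difference moment, which corresponds to a locally homogeneous texture.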

We  compute  the  14  features  from  the  co-occurrence 
matrix  formed  for  each  ping  along  all  256  bins.  The 
corresponding  features  for  all  pings  are  concatenated 
to  form  a  14-dimensional  texture  feature  data  set.  A 


detailed  description  of  texture  feature  extraction  is 
provided  in  [6]. 

4  Data  Reduction 

The volume of data generated by texture analysis
is too large for direct classification. We
have  to  reduce  the  data  and,  at  the  same  time,  retain 
the  most  useful  part  of  the  data.  There  are  several 
methods  of  reducing  data.  A  very  common  approach 
employed  is  the  extraction  of  principal  components 
from  the  original  data. 

The  principal  components  transform  (also  known 
as  the  discrete  Karhunen-Loeve  transform  or  the 
Hotelling  transform),  is  used  to  transform  the  14- 
dimensional  data  set  into  another  feature  space  of  the 
same  dimension.  In  our  case,  each  pixel  in  the  tex¬ 
ture  image  is  represented  by  a  14-dimensional  vector, 
say x. We have a total of 283 × 197 (= 55751) such
vectors.  The  mean  vector  and  covariance  matrix  of 
the  vectors  are  easily  estimated.  The  principal  com¬ 
ponents  transform  is  computed  using  the  equation 

$$y = A(x - m_x) \qquad (3)$$

where A is the matrix formed from a sorted set of
eigenvectors of the covariance matrix C_x and m_x is
the  mean  vector.  By  discarding  those  eigenvectors 
for  which  the  corresponding  eigenvalues  are  relatively 
small,  the  size  of  the  matrix  A  can  be  suitably  re¬ 
duced  to  make  y  a  vector  of  desired  dimension.  It  can 
be  shown  that  the  new  feature  space  is  one  in  which 
the  data  from  different  features  are  uncorrelated  and 
the  particular  choice  of  eigenvectors  retains  the  max¬ 
imum  possible  information  [12]. 
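
The transform of Equation (3) can be sketched with NumPy as shown below. The function name `principal_components` and the synthetic 14-dimensional data are assumptions made for the illustration; the actual texture feature data set is not reproduced here.

```python
import numpy as np

def principal_components(X, n_keep):
    """Apply y = A (x - m_x) to every row of X, keeping n_keep components."""
    m_x = X.mean(axis=0)                    # mean vector
    C_x = np.cov(X, rowvar=False)           # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(C_x)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort in descending order
    eigvals = eigvals[order]
    A = eigvecs[:, order[:n_keep]].T        # rows of A are the leading eigenvectors
    Y = (X - m_x) @ A.T                     # transformed samples, one per row
    return Y, eigvals, A

# Synthetic stand-in for the 14-dimensional texture feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14)) * np.linspace(5.0, 0.1, 14)
Y, eigvals, A = principal_components(X, n_keep=4)
```

The covariance matrix of `Y` is diagonal, with the four largest eigenvalues on its diagonal, which is the uncorrelatedness property stated above.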

We  retain  the  first  four  components  of  the  trans¬ 
formed  data  and  make  it  the  feature  space  for  the 
classifier.  The  four  principal  component  images  are 
shown  in  Figure  5.  The  sorted  eigenvalues  of  the 
covariance matrix are 8805.451, 2.207, 1.376, 0.8799,
0.1558, 0.04136, 0.02413, 1.124 × 10⁻³, 7.709 × 10⁻⁴,
3.251 × 10⁻⁴, 3.109 × 10⁻⁴, 2.338 × 10⁻⁴, 3.19 × 10⁻⁵,
and 1.748 × 10⁻⁵. The first four principal components
are chosen because they represent the entire feature
space with a mean square error of 0.224104. This is
computed using the relationship

$$e_{ms} = \sum_{j=1}^{n} \lambda_j - \sum_{j=1}^{K} \lambda_j \qquad (4)$$

where, in our case, n = 14, K = 4, and the λ_j are the
eigenvalues. 
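
The quoted mean square error can be checked numerically: it is the sum of the ten discarded eigenvalues. In the sketch below the exponents of the seven smallest eigenvalues are assumed to be 10⁻³, 10⁻⁴ (four values), and 10⁻⁵ (two values); this assignment is an assumption, adopted because it keeps the list sorted and reproduces the stated error of 0.224104.

```python
# Sorted eigenvalues of the covariance matrix; the exponents of the last
# seven values are assumed (see the note above).
eigenvalues = [8805.451, 2.207, 1.376, 0.8799, 0.1558, 0.04136, 0.02413,
               1.124e-3, 7.709e-4, 3.251e-4, 3.109e-4, 2.338e-4,
               3.19e-5, 1.748e-5]

n, K = len(eigenvalues), 4

# Mean square error: sum of all eigenvalues minus the sum of the K retained.
mse = sum(eigenvalues) - sum(eigenvalues[:K])

# Fraction of the total variance carried by the K retained components.
retained_fraction = sum(eigenvalues[:K]) / sum(eigenvalues)
```

Because the first eigenvalue is several thousand times larger than the second, almost all of the variance is carried by the first component alone.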


5  Classification 

Classification,  the  goal  of  pattern  recognition,  is  the 
process  of  assigning  each  of  the  objects  of  interest  to 
one  of  a  number  of  categories  or  classes.  The  objects 
of  interest  are  called  patterns.  Each  of  these  patterns 
is  represented  by  a  vector  of  dimension,  say,  n,  where  n 
is  the  number  of  features  used  to  represent  the  pattern. 
As  an  example,  consider  the  problem  of  recognizing  a 
digit  from  0  to  9.  Suppose  that  each  of  the  digits  is 
contained  in  a  grid  divided  into  n  small  squares  as  in 
Figure  1.  Then  one  way  to  form  a  feature  vector  is  to 
measure the area occupied by the digit in each of the
small  squares. 


Figure  1:  The  digit  “1”  in  a  grid  of  25  squares 

Thus, x = [x_1 x_2 ... x_n]^T is a vector representing a
digit, where x_1, x_2, ..., x_n are the areas occupied by
the digit in the little squares 1, 2, ..., n respectively.
The  problem  of  pattern  recognition  is,  therefore,  to  as¬ 
sign  a  class  label  to  an  unknown  pattern,  or  a  random 
vector  in  the  feature  space.  A  function  which  sepa¬ 
rates  any  two  classes  is  called  a  discriminant  function 
and  a  network  which  classifies  a  pattern  based  on  the 
values of the discriminant functions is called the
classifier [8].

The probability of misclassification is the key factor
in analyzing the performance of any classifier. It
is  well  known  that  the  optimal  classifier,  assuming  the 
distributions  of  the  random  vectors  are  known,  is  the 
Bayes  classifier  which  is  studied  under  statistical  hy¬ 
pothesis  testing  [8].  However,  the  implementation  of 
the  Bayes  classifier  is  difficult  because  of  its  complex¬ 
ity,  especially  when  the  dimensionality  is  high. 

If  there  exists  a  set  of  patterns,  the  class  assignment 
of  which  is  already  known,  the  process  of  classification 
is  called  supervised  classification.  A  portion  of  the 
set  of  labeled  patterns,  called  the  training  set,  is  used 
to  derive  a  classification  algorithm.  The  rest  of  the 
labeled  patterns  comprise  the  test  set  and  are  used 
to  test  the  classification  algorithm  and  evaluate  its 
performance. Once the algorithm is tuned to provide
the desired performance, it can be used on initially
unlabeled patterns [2].

Figure 2: A clustering example

Sometimes,  however,  we  may  not  have  a  set  of  la¬ 
beled  patterns  and  we  may  not  even  know  the  num¬ 
ber  of  classes.  The  problem  is  not  only  to  classify 
the  data,  but  also  to  define  the  classes.  Several  ap¬ 
proaches  to  this  problem  have  been  dealt  with  in  the 
literature  [16],  [10],  [19].  The  ensuing  discussion  is 
solely  concerned  with  a  procedure  of  seeking  clusters 
of  points  in  the  measurement  space  called  clustering. 

5.1  Clustering 

Clustering  is  the  process  by  which  samples  with 
“similar”  features  are  combined  together  to  form  a 
single  cluster.  The  fundamental  issue  in  the  cluster¬ 
ing  problem  is  the  definition  of  a  cluster  or,  equiv¬ 
alently,  the  choice  of  features.  There  are  two  ap¬ 
proaches  to  clustering,  the  parametric  approach  and 
the  nonparametric  approach.  Parametric  approaches 
require  either  clustering  criteria  to  be  defined  or  as¬ 
sume  a  mathematical  form  for  the  distribution  of  the 
samples.  Most  often,  a  clustering  criterion  is  defined 
and  samples  are  assigned  to  classes  such  that  this  cri¬ 
terion  is  optimized.  A  typical  example  of  assuming 
a  mathematical  form  for  the  distribution  is  the  prob¬ 
lem  of  finding  parameters  that  best  fit  the  data,  the 
distribution  of  which  is  assumed  to  be  a  summation 
of  normal  distributions.  On  the  other  hand,  nonpara¬ 
metric  approaches  separate  samples  according  to  the 
valley  of  the  density  function  [8].  Figure  2  shows  a  set 
of  two-dimensional  patterns  grouped  into  two  clusters 
using  a  distance  function  as  the  criterion. 

In  the  absence  of  ground-truth  information,  i.e.,  a 
training  set,  we  are  led  to  employ  a  clustering  algo¬ 


rithm  on  the  samples  in  the  multidimensional  tex¬ 
ture  feature  image.  Most  of  the  clustering  algorithms 
which  seek  to  optimize  a  clustering  criterion  are  itera¬ 
tive.  These  algorithms  are  not  guaranteed  to  converge 
and  even  if  they  do  converge,  they  may  converge  to  a 
local  minimum  rather  than  the  global  minimum.  A 
branch  and  bound  procedure  which  is  guaranteed  to 
find  the  global  minimum  is  given  in  [18].  This  algo¬ 
rithm,  however,  is  not  practicable  for  the  magnitude 
of  data  in  our  case.  A  simple  clustering  algorithm  [2] 
which  optimizes  a  criterion  iteratively,  is  given  below. 
This  is  followed  by  a  very  popular  algorithm  called 
the  K-Means  Algorithm  [2]  which  optimizes  a  specific 
criterion. 

5.2  A  Simple  Clustering  Algorithm 

Suppose the number of clusters N_c is known. Let
X = {x_1, ..., x_N} denote the set of samples to be classified
and Ω an ordered set of class labels assigned to the
samples. Further suppose that ω_1, ..., ω_{N_c} are the class
labels and Ω^(r) is the set of class labels at the rth
iteration. Assume that the classification is optimal
when a criterion function J(X, Ω) is minimized. The
following general procedure can be used in an attempt
to minimize J.

1. Choose an initial classification Ω^(0) and compute J.

2.  Change  the  classification  in  a  way  that  tends  to 
decrease  J. 

3.  If  it  is  not  possible  to  decrease  J  in  step  2,  then 
stop;  else  go  to  step  2. 

Since  the  variables  in  this  optimization  problem  are 
the  class  labels  which  are  discrete,  gradient  search 
techniques  cannot  be  used.  One  way  to  solve  this 
problem  is  to  determine  the  change  in  the  class  label 
for  each  sample  that  would  result  in  the  greatest  de¬ 
crease  in  J  and  apply  these  changes  in  step  2.  Suppose 
that Ω^(r) = {ω_{k_1}, ..., ω_{k_N}}, where N is the number
of samples in X. If ΔJ_i is the largest negative change
in J that can be made by reclassifying sample x_i, and
ω_{k_i'} is the corresponding new label for x_i, then the
new set of labels is Ω^(r+1) = {ω_{k_1'}, ..., ω_{k_N'}}.

Observe that since ΔJ_i is evaluated by making one
change at a time and Ω^(r+1) is obtained by making all
changes simultaneously, the change in the value of J
is not, in general, equal to the sum of the ΔJ_i over
i = 1, ..., N. It is highly likely,
though, that the criterion function has decreased.
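
The procedure of this section can be sketched as follows. The text leaves the criterion J unspecified, so this illustration assumes the sum of squared distances to the cluster means; the function names `criterion` and `simple_clustering`, the iteration cap `max_iter`, and the six test points are likewise assumptions of the example. Every candidate relabeling is evaluated one sample at a time, and all improving changes are then applied simultaneously.

```python
import numpy as np

def criterion(X, labels, n_clusters):
    """Assumed criterion J: sum of squared distances to the cluster means."""
    J = 0.0
    for k in range(n_clusters):
        members = X[labels == k]
        if len(members) > 0:
            J += ((members - members.mean(axis=0)) ** 2).sum()
    return J

def simple_clustering(X, initial_labels, n_clusters, max_iter=100):
    labels = np.array(initial_labels)
    J = criterion(X, labels, n_clusters)           # step 1
    for _ in range(max_iter):                      # cap guards against cycling
        new_labels = labels.copy()
        for i in range(len(X)):
            best_delta, best_label = 0.0, labels[i]
            for k in range(n_clusters):
                if k == labels[i]:
                    continue
                trial = labels.copy()
                trial[i] = k                       # change one label at a time
                delta = criterion(X, trial, n_clusters) - J
                if delta < best_delta:             # largest negative change
                    best_delta, best_label = delta, k
            new_labels[i] = best_label
        if np.array_equal(new_labels, labels):     # step 3: no decrease possible
            break
        labels = new_labels                        # step 2: apply all changes
        J = criterion(X, labels, n_clusters)
    return labels, J

# Six hypothetical 2-D samples in two well separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, J = simple_clustering(X, [0, 0, 1, 1, 1, 1], n_clusters=2)
```

Evaluating J from scratch for every trial is costly, so this form is only practical for small sample sets.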

5.3  The  K-Means  Algorithm 

This  algorithm  uses  a  similarity  measure  that  is  the 
Euclidean  distance  of  the  samples  and  a  criterion  J 


defined  by 

$$J = \sum_{k=1}^{N_c} \sum_{x_i \in \omega_k} \lVert x_i - \mu_k \rVert^2 \qquad (5)$$

where the second sum is over all samples in the kth
cluster and μ_k is the “center” of the cluster. It is
easily seen that for a fixed set of samples and class
assignments, J is minimized by choosing μ_k to be the
sample mean of the kth cluster. Moreover, when μ_k
is the sample mean, J is minimized by assigning x_i
to the class of the cluster with the nearest mean. A
number  of  other  criteria  are  given  in  [8]. 

The  complete  algorithm  is  outlined  below. 

1.  Make  an  arbitrary  assignment  of  samples  to  clus¬ 
ters. 

2.  Compute  the  sample  mean  of  each  cluster. 

3.  Reassign  each  sample  to  the  cluster  with  the  near¬ 
est  mean. 

4.  If  there  is  no  change  in  classification,  then  stop; 
else  go  to  step  2. 
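
The four steps above can be sketched in Python as follows. The function name `k_means`, the iteration cap `max_iter`, the handling of empty clusters, and the six test points are assumptions of this illustration; the arbitrary initial assignment of step 1 is supplied by the caller.

```python
import numpy as np

def k_means(X, initial_labels, n_clusters, max_iter=100):
    """K-Means with Euclidean distance; returns final labels and cluster means."""
    labels = np.array(initial_labels)              # step 1: arbitrary assignment
    means = np.full((n_clusters, X.shape[1]), np.inf)
    for _ in range(max_iter):
        # Step 2: sample mean of each cluster. An empty cluster keeps an
        # infinite "mean" so that it attracts no samples (an assumption).
        means = np.full((n_clusters, X.shape[1]), np.inf)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:
                means[k] = members.mean(axis=0)
        # Step 3: reassign each sample to the cluster with the nearest mean.
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when the classification no longer changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, means

# Six hypothetical 2-D samples in two well separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, means = k_means(X, [0, 1, 0, 1, 0, 1], n_clusters=2)
```

The result depends on the initial assignment, which is why the algorithm may reach a local rather than the global minimum of J.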


6  Results 

The K-Means algorithm was applied to the 4-
dimensional  texture  image  and  the  result  of  the  clas¬ 
sification  is  shown  in  Figure  3.  The  contrast  of  the 
image  has  been  improved  to  make  it  more  visible.  The 
number  of  clusters  chosen  was  15.  The  solid  vertical 
band  along  the  middle  of  the  image  corresponds  to  the 
nadir  and  near-nadir  portion  of  the  seafloor.  Lack  of 
adequate  backscatter  information  from  this  portion  is 
the  cause  for  the  apparent  homogeneity  of  the  seafloor 
near  the  nadir.  The  presence  of  areas  on  the  seafloor 
with  different  textures  is  evident. 


7  Conclusion 

In  this  paper,  we  have  presented  a  novel  approach 
to  seafloor  characterization.  The  results  indicate  that 
texture  analysis  of  bathymetric  sonar  images  is  a  pow¬ 
erful  technique  to  determine  seafloor  roughness  and 
bottom type. In the absence of ground-truth information,
unsupervised clustering separated the seafloor into
regions of visibly different texture. More work needs to be done in analyzing
the  texture  features  and  associating  each  cluster  with 
a  type  of  seafloor  surface  (labeling).  We  conclude  that 
texture  analysis  is  a  useful  tool  for  seafloor  mapping. 


Figure  3:  The  texture  image  after  clustering 
Figure 4: Sample co-occurrence calculations [14]
Figure 5: The four principal components


